Reducing memory allocation from client-side stats#10705
Reducing memory allocation from client-side stats#10705
Conversation
Introduced DDCache-s around UTF8BytesString construction UTF8BytesString are advantageous for serialization, but that only applies to the key instance that is actually serialized Most key instances here are being created to do a look-up into the map. so the UTF8BytesString creation was extra work. This change is intended as a quick fix. I think there's still a lot of room for improvement by restructuring the code further, but that is left for a future PR. As is this changer reduces the impact on application throughput by 5-20% depending on the heap size.
BenchmarksStartupParameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 61 metrics, 10 unstable metrics. Startup time reports for insecure-bankgantt
title insecure-bank - global startup overhead: candidate=1.60.0-SNAPSHOT~d9ea1af006, baseline=1.61.0-SNAPSHOT~0ce91539df
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.065 s) : 0, 1064646
Total [baseline] (8.819 s) : 0, 8818686
Agent [candidate] (1.06 s) : 0, 1060262
Total [candidate] (8.82 s) : 0, 8819663
section iast
Agent [baseline] (1.235 s) : 0, 1234551
Total [baseline] (9.546 s) : 0, 9546095
Agent [candidate] (1.223 s) : 0, 1222589
Total [candidate] (9.545 s) : 0, 9545283
gantt
title insecure-bank - break down per module: candidate=1.60.0-SNAPSHOT~d9ea1af006, baseline=1.61.0-SNAPSHOT~0ce91539df
dateFormat X
axisFormat %s
section tracing
crashtracking [baseline] (1.205 ms) : 0, 1205
crashtracking [candidate] (1.194 ms) : 0, 1194
BytebuddyAgent [baseline] (633.089 ms) : 0, 633089
BytebuddyAgent [candidate] (629.829 ms) : 0, 629829
AgentMeter [baseline] (29.433 ms) : 0, 29433
AgentMeter [candidate] (29.029 ms) : 0, 29029
GlobalTracer [baseline] (258.534 ms) : 0, 258534
GlobalTracer [candidate] (257.396 ms) : 0, 257396
AppSec [baseline] (31.905 ms) : 0, 31905
AppSec [candidate] (31.619 ms) : 0, 31619
Debugger [baseline] (59.177 ms) : 0, 59177
Debugger [candidate] (58.779 ms) : 0, 58779
Remote Config [baseline] (592.37 µs) : 0, 592
Remote Config [candidate] (598.32 µs) : 0, 598
Telemetry [baseline] (8.801 ms) : 0, 8801
Telemetry [candidate] (8.69 ms) : 0, 8690
Flare Poller [baseline] (5.76 ms) : 0, 5760
Flare Poller [candidate] (7.128 ms) : 0, 7128
section iast
crashtracking [baseline] (1.209 ms) : 0, 1209
crashtracking [candidate] (1.191 ms) : 0, 1191
BytebuddyAgent [baseline] (802.241 ms) : 0, 802241
BytebuddyAgent [candidate] (793.741 ms) : 0, 793741
AgentMeter [baseline] (11.538 ms) : 0, 11538
AgentMeter [candidate] (11.279 ms) : 0, 11279
GlobalTracer [baseline] (248.577 ms) : 0, 248577
GlobalTracer [candidate] (246.694 ms) : 0, 246694
IAST [baseline] (25.345 ms) : 0, 25345
IAST [candidate] (25.071 ms) : 0, 25071
AppSec [baseline] (26.599 ms) : 0, 26599
AppSec [candidate] (26.24 ms) : 0, 26240
Debugger [baseline] (62.719 ms) : 0, 62719
Debugger [candidate] (62.379 ms) : 0, 62379
Remote Config [baseline] (529.463 µs) : 0, 529
Remote Config [candidate] (517.596 µs) : 0, 518
Telemetry [baseline] (15.243 ms) : 0, 15243
Telemetry [candidate] (14.691 ms) : 0, 14691
Flare Poller [baseline] (4.415 ms) : 0, 4415
Flare Poller [candidate] (4.825 ms) : 0, 4825
Startup time reports for petclinicgantt
title petclinic - global startup overhead: candidate=1.60.0-SNAPSHOT~d9ea1af006, baseline=1.61.0-SNAPSHOT~0ce91539df
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.055 s) : 0, 1054543
Total [baseline] (11.077 s) : 0, 11076691
Agent [candidate] (1.058 s) : 0, 1057768
Total [candidate] (11.022 s) : 0, 11021907
section appsec
Agent [baseline] (1.246 s) : 0, 1245685
Total [baseline] (11.127 s) : 0, 11127002
Agent [candidate] (1.25 s) : 0, 1249547
Total [candidate] (11.091 s) : 0, 11091410
section iast
Agent [baseline] (1.244 s) : 0, 1243970
Total [baseline] (11.393 s) : 0, 11393016
Agent [candidate] (1.24 s) : 0, 1240433
Total [candidate] (11.369 s) : 0, 11368884
section profiling
Agent [baseline] (1.188 s) : 0, 1188097
Total [baseline] (11.028 s) : 0, 11028081
Agent [candidate] (1.185 s) : 0, 1184848
Total [candidate] (10.974 s) : 0, 10974013
gantt
title petclinic - break down per module: candidate=1.60.0-SNAPSHOT~d9ea1af006, baseline=1.61.0-SNAPSHOT~0ce91539df
dateFormat X
axisFormat %s
section tracing
crashtracking [baseline] (1.185 ms) : 0, 1185
crashtracking [candidate] (1.186 ms) : 0, 1186
BytebuddyAgent [baseline] (625.249 ms) : 0, 625249
BytebuddyAgent [candidate] (628.216 ms) : 0, 628216
AgentMeter [baseline] (29.008 ms) : 0, 29008
AgentMeter [candidate] (29.096 ms) : 0, 29096
GlobalTracer [baseline] (255.861 ms) : 0, 255861
GlobalTracer [candidate] (256.844 ms) : 0, 256844
AppSec [baseline] (31.458 ms) : 0, 31458
AppSec [candidate] (31.458 ms) : 0, 31458
Debugger [baseline] (59.408 ms) : 0, 59408
Debugger [candidate] (59.375 ms) : 0, 59375
Remote Config [baseline] (581.408 µs) : 0, 581
Remote Config [candidate] (584.747 µs) : 0, 585
Telemetry [baseline] (8.592 ms) : 0, 8592
Telemetry [candidate] (8.638 ms) : 0, 8638
Flare Poller [baseline] (7.332 ms) : 0, 7332
Flare Poller [candidate] (6.418 ms) : 0, 6418
section appsec
crashtracking [baseline] (1.189 ms) : 0, 1189
crashtracking [candidate] (1.193 ms) : 0, 1193
BytebuddyAgent [baseline] (657.866 ms) : 0, 657866
BytebuddyAgent [candidate] (660.684 ms) : 0, 660684
AgentMeter [baseline] (12.025 ms) : 0, 12025
AgentMeter [candidate] (12.026 ms) : 0, 12026
GlobalTracer [baseline] (258.299 ms) : 0, 258299
GlobalTracer [candidate] (258.998 ms) : 0, 258998
IAST [baseline] (23.977 ms) : 0, 23977
IAST [candidate] (24.032 ms) : 0, 24032
AppSec [baseline] (177.404 ms) : 0, 177404
AppSec [candidate] (177.39 ms) : 0, 177390
Debugger [baseline] (65.46 ms) : 0, 65460
Debugger [candidate] (65.933 ms) : 0, 65933
Remote Config [baseline] (579.964 µs) : 0, 580
Remote Config [candidate] (569.737 µs) : 0, 570
Telemetry [baseline] (8.963 ms) : 0, 8963
Telemetry [candidate] (8.904 ms) : 0, 8904
Flare Poller [baseline] (3.611 ms) : 0, 3611
Flare Poller [candidate] (3.571 ms) : 0, 3571
section iast
crashtracking [baseline] (1.211 ms) : 0, 1211
crashtracking [candidate] (1.206 ms) : 0, 1206
BytebuddyAgent [baseline] (808.376 ms) : 0, 808376
BytebuddyAgent [candidate] (805.971 ms) : 0, 805971
AgentMeter [baseline] (11.812 ms) : 0, 11812
AgentMeter [candidate] (11.681 ms) : 0, 11681
GlobalTracer [baseline] (249.83 ms) : 0, 249830
GlobalTracer [candidate] (249.168 ms) : 0, 249168
IAST [baseline] (25.643 ms) : 0, 25643
IAST [candidate] (25.47 ms) : 0, 25470
AppSec [baseline] (26.918 ms) : 0, 26918
AppSec [candidate] (26.713 ms) : 0, 26713
Debugger [baseline] (63.704 ms) : 0, 63704
Debugger [candidate] (64.263 ms) : 0, 64263
Remote Config [baseline] (546.536 µs) : 0, 547
Remote Config [candidate] (528.814 µs) : 0, 529
Telemetry [baseline] (15.13 ms) : 0, 15130
Telemetry [candidate] (14.542 ms) : 0, 14542
Flare Poller [baseline] (4.531 ms) : 0, 4531
Flare Poller [candidate] (4.447 ms) : 0, 4447
section profiling
crashtracking [baseline] (1.168 ms) : 0, 1168
crashtracking [candidate] (1.183 ms) : 0, 1183
BytebuddyAgent [baseline] (686.64 ms) : 0, 686640
BytebuddyAgent [candidate] (685.454 ms) : 0, 685454
AgentMeter [baseline] (8.64 ms) : 0, 8640
AgentMeter [candidate] (8.579 ms) : 0, 8579
GlobalTracer [baseline] (216.414 ms) : 0, 216414
GlobalTracer [candidate] (215.836 ms) : 0, 215836
AppSec [baseline] (32.103 ms) : 0, 32103
AppSec [candidate] (32.175 ms) : 0, 32175
Debugger [baseline] (63.423 ms) : 0, 63423
Debugger [candidate] (62.951 ms) : 0, 62951
Remote Config [baseline] (621.679 µs) : 0, 622
Remote Config [candidate] (582.683 µs) : 0, 583
Telemetry [baseline] (10.643 ms) : 0, 10643
Telemetry [candidate] (10.483 ms) : 0, 10483
Flare Poller [baseline] (3.501 ms) : 0, 3501
Flare Poller [candidate] (3.513 ms) : 0, 3513
ProfilingAgent [baseline] (93.961 ms) : 0, 93961
ProfilingAgent [candidate] (93.241 ms) : 0, 93241
Profiling [baseline] (94.516 ms) : 0, 94516
Profiling [candidate] (93.798 ms) : 0, 93798
LoadParameters
See matching parameters
SummaryFound 0 performance improvements and 2 performance regressions! Performance is the same for 17 metrics, 17 unstable metrics.
Request duration reports for petclinicgantt
title petclinic - request duration [CI 0.99] : candidate=1.60.0-SNAPSHOT~d9ea1af006, baseline=1.61.0-SNAPSHOT~0ce91539df
dateFormat X
axisFormat %s
section baseline
no_agent (18.534 ms) : 18343, 18724
. : milestone, 18534,
appsec (18.553 ms) : 18364, 18742
. : milestone, 18553,
code_origins (17.655 ms) : 17477, 17833
. : milestone, 17655,
iast (18.074 ms) : 17894, 18254
. : milestone, 18074,
profiling (19.382 ms) : 19188, 19575
. : milestone, 19382,
tracing (17.59 ms) : 17419, 17760
. : milestone, 17590,
section candidate
no_agent (18.529 ms) : 18337, 18720
. : milestone, 18529,
appsec (18.929 ms) : 18734, 19123
. : milestone, 18929,
code_origins (17.948 ms) : 17769, 18127
. : milestone, 17948,
iast (18.022 ms) : 17842, 18201
. : milestone, 18022,
profiling (19.507 ms) : 19309, 19704
. : milestone, 19507,
tracing (17.708 ms) : 17533, 17883
. : milestone, 17708,
Request duration reports for insecure-bankgantt
title insecure-bank - request duration [CI 0.99] : candidate=1.60.0-SNAPSHOT~d9ea1af006, baseline=1.61.0-SNAPSHOT~0ce91539df
dateFormat X
axisFormat %s
section baseline
no_agent (1.164 ms) : 1153, 1176
. : milestone, 1164,
iast (3.159 ms) : 3114, 3204
. : milestone, 3159,
iast_FULL (5.902 ms) : 5843, 5962
. : milestone, 5902,
iast_GLOBAL (3.382 ms) : 3322, 3442
. : milestone, 3382,
profiling (2.101 ms) : 2082, 2121
. : milestone, 2101,
tracing (1.813 ms) : 1798, 1828
. : milestone, 1813,
section candidate
no_agent (1.167 ms) : 1156, 1178
. : milestone, 1167,
iast (3.175 ms) : 3132, 3218
. : milestone, 3175,
iast_FULL (6.061 ms) : 5999, 6123
. : milestone, 6061,
iast_GLOBAL (3.684 ms) : 3614, 3754
. : milestone, 3684,
profiling (1.989 ms) : 1969, 2010
. : milestone, 1989,
tracing (1.748 ms) : 1734, 1762
. : milestone, 1748,
DacapoParameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 10 metrics, 2 unstable metrics. Execution time for biojavagantt
title biojava - execution time [CI 0.99] : candidate=1.60.0-SNAPSHOT~d9ea1af006, baseline=1.61.0-SNAPSHOT~0ce91539df
dateFormat X
axisFormat %s
section baseline
no_agent (15.016 s) : 15016000, 15016000
. : milestone, 15016000,
appsec (15.023 s) : 15023000, 15023000
. : milestone, 15023000,
iast (18.42 s) : 18420000, 18420000
. : milestone, 18420000,
iast_GLOBAL (17.755 s) : 17755000, 17755000
. : milestone, 17755000,
profiling (14.737 s) : 14737000, 14737000
. : milestone, 14737000,
tracing (15.072 s) : 15072000, 15072000
. : milestone, 15072000,
section candidate
no_agent (15.57 s) : 15570000, 15570000
. : milestone, 15570000,
appsec (15.044 s) : 15044000, 15044000
. : milestone, 15044000,
iast (18.102 s) : 18102000, 18102000
. : milestone, 18102000,
iast_GLOBAL (17.902 s) : 17902000, 17902000
. : milestone, 17902000,
profiling (14.748 s) : 14748000, 14748000
. : milestone, 14748000,
tracing (15.115 s) : 15115000, 15115000
. : milestone, 15115000,
Execution time for tomcatgantt
title tomcat - execution time [CI 0.99] : candidate=1.60.0-SNAPSHOT~d9ea1af006, baseline=1.61.0-SNAPSHOT~0ce91539df
dateFormat X
axisFormat %s
section baseline
no_agent (1.475 ms) : 1464, 1487
. : milestone, 1475,
appsec (3.759 ms) : 3542, 3976
. : milestone, 3759,
iast (2.259 ms) : 2190, 2328
. : milestone, 2259,
iast_GLOBAL (2.307 ms) : 2236, 2377
. : milestone, 2307,
profiling (2.088 ms) : 2033, 2143
. : milestone, 2088,
tracing (2.079 ms) : 2025, 2133
. : milestone, 2079,
section candidate
no_agent (1.475 ms) : 1463, 1486
. : milestone, 1475,
appsec (3.751 ms) : 3534, 3968
. : milestone, 3751,
iast (2.259 ms) : 2189, 2328
. : milestone, 2259,
iast_GLOBAL (2.304 ms) : 2234, 2373
. : milestone, 2304,
profiling (2.518 ms) : 2352, 2684
. : milestone, 2518,
tracing (2.086 ms) : 2031, 2140
. : milestone, 2086,
|
| public final class MetricKey { | ||
| static final DDCache<String, UTF8BytesString> RESOURCE_CACHE = DDCaches.newFixedSizeCache(32); | ||
| static final DDCache<String, UTF8BytesString> SERVICE_CACHE = DDCaches.newFixedSizeCache(8); | ||
| static final DDCache<String, UTF8BytesString> SERVICE_SOURCE_CACHE = DDCaches.newFixedSizeCache(4); |
There was a problem hiding this comment.
This should be already filled with UTF8BytesString. It can take values of the instrumentation name (that's should also be UTF8BytesString) so I suggest removing caching for this specific one. 4, is also too small
There was a problem hiding this comment.
The lookup logic accounts for receiving a UTF8BytesString and in that case bypasses the cache entirely.
Also, I erred on the side of keeping the caches small. I just want to eliminate most of the allocation - not all of the allocation.
That said, I may have misunderstood what "serviceSource" is so maybe that one should be larger. If it is instrumentations, then maybe we should increase it to 16. (The thinking being that only a few instrumentations are typically active in a given system.)
There was a problem hiding this comment.
If it can only be a UTF8BytesString, we could also tighten the type in the constructor. Although in that case, the JIT would dead code eliminate using the cache anyway (except for the initial allocation).
| static final DDCache<String, UTF8BytesString> TYPE_CACHE = DDCaches.newFixedSizeCache(8); | ||
| static final DDCache<String, UTF8BytesString> KIND_CACHE = DDCaches.newFixedSizeCache(8); | ||
| static final DDCache<String, UTF8BytesString> HTTP_METHOD_CACHE = DDCaches.newFixedSizeCache(8); | ||
| static final DDCache<String, UTF8BytesString> HTTP_ENDPOINT_CACHE = DDCaches.newFixedSizeCache(32); |
There was a problem hiding this comment.
Would it be possible to widen those a bit? Also, for the http endpoint, I'm wondering if we should just do that caching earlier to make other serialisation benefitting of this (i.e. in EndpointResolver)
There was a problem hiding this comment.
I erred on the side of making the caches small.
The thinking is any object reuse is an improvement over what we were doing, but consuming too much memory could be harmful.
| } | ||
|
|
||
| static UTF8BytesString utf8(DDCache<String, UTF8BytesString> cache, CharSequence charSeq) { | ||
| if ( charSeq == null ) { |
There was a problem hiding this comment.
There are things that are supposing to check nullity of this. Now if we replace with empty is not more the same semantic and this will break some code. I suggest to let the caller return the default value so it won't break the existing logic
There was a problem hiding this comment.
Nevermind - I see that I did accidentally change the semantics for method & endpoint. I'll fix that.
I kept the semantics that already existed. If you look at the prior code, null was replaced with EMPTY, so I did the same.
There was a problem hiding this comment.
well there are things that are OK to return empty but others needs to be literally null (i.e. service source, httpMethod, httpEndpoint)
There was a problem hiding this comment.
Yes, I realized after my initial reply. I'll double check all of them.
There was a problem hiding this comment.
Since the split of EMPTY vs null was about 50 / 50, I decided to just restore the null checks in the constructor.
- fix null handling for http method & endpoint - increase service source cache size
Since it about a 50/50 split between cases that use EMPTY vs null, when a null is passed I decided to just put the null handling back into the constructor. That makes the choice more explicit, and makes the PR easier to review / compare to the prior logic
| public final class MetricKey { | ||
| static final DDCache<String, UTF8BytesString> RESOURCE_CACHE = DDCaches.newFixedSizeCache(32); | ||
| static final DDCache<String, UTF8BytesString> SERVICE_CACHE = DDCaches.newFixedSizeCache(8); | ||
| static final DDCache<String, UTF8BytesString> SERVICE_SOURCE_CACHE = |
There was a problem hiding this comment.
That field does not need to be cached, for two main reasons:
- It is already either a
UTF8BytesStringornull. In fact, it comes from thecomponent()method, which in helpers typically returns aUTF8BytesString. - Its cardinality matches that of the instrumentations, the size of the cache is too small.
The constructor calling UTF8BytesString.create() may look confusing at first glance, but in this case it does not allocate a new object — it simply returns the existing instance. It was kept for consistency with the usual pattern.
Unless there are strong counterarguments, I would suggest removing the cache for this field.
There was a problem hiding this comment.
I think the key part here is "typically returns a UTF8BytesString".
The static typing doesn't currently guarantee it.
And the caching code accounts for the case of receiving a UTF8BytesString by bypassing the cache. In the same manner as UTF8BytesString.create did previously.
When a UTF8BytesString is passed, the cache incurs no overhead other than the static overhead of the cache array itself. But in the case where something other than UTF8BytesString is passed, the cache can significantly reduce memory consumption.
I think that's important because we need to curtail the worst case outcomes.
As for the size, the cardinality doesn't need to match total instrumentations. That would be too large. The size just needs to accommodate the active span producing instrumentations. As I said, I erred on the small size, since I want to avoid the worst case of consuming a lot of static memory.
And assuming a UTF8BytesString is introducing a subtle form of coupling that could easily be compromised by someone later on.
amarziali
left a comment
There was a problem hiding this comment.
Thanks for that improvement. wrt service name source, as per our chat, we might simplify trying changing the signature of that constructor if possible and shortcut the cache if we ensure that every value is already an UTF8BytesString itself.
|
/merge |
|
View all feedbacks in Devflow UI.
This pull request is not mergeable according to GitHub. Common reasons include pending required checks, missing approvals, or merge conflicts — but it could also be blocked by other repository rules or settings. Use
The expected merge time in Use ⏳ Processing |
What Does This Do
Introduces DDCache-s around UTF8BytesString construction
Motivation
UTF8BytesString are advantageous for serialization, but that only applies to the MetricKey instance that is actually serialized
Most MetricKey instances are only created to do a look-up into the map. so the UTF8BytesString creation was just extra work
This change reduces allocation by 6% and GC time by 7% in span creation stress test.
This change reduces impact on application throughput by 5-20% depending on heap size.
Additional Notes
This change is intended as a quick fix. I think there's still a lot of room for improvement by restructuring the code further, but that is left for a future PR.
Contributor Checklist
type:and (comp:orinst:) labels in addition to any other useful labelsclose,fix, or any linking keywords when referencing an issueUse
solvesinstead, and assign the PR milestone to the issueJira ticket: [PROJ-IDENT]
Note: Once your PR is ready to merge, add it to the merge queue by commenting
/merge./merge -ccancels the queue request./merge -f --reason "reason"skips all merge queue checks; please use this judiciously, as some checks do not run at the PR-level. For more information, see this doc.